1. Introduction

Consider the following informal problem: there are a large number of people (or processing units), each of whom knows $n$ numbers $a_1, a_2, \ldots, a_n$. They all wish to compute the sum of these numbers. If they cannot communicate, there is no way to avoid sequential ($\Omega(n)$ time) summation by each person separately. On the other hand, it is shown in this paper that with only one communication channel (one cell of shared memory) this time can be reduced to $O(\sqrt{n})$. With $n$ (resp. $2^n$) shared memory cells the time can be reduced further to $O(\log n)$ (resp. $O(1)$). This shows that a communication facility is essential for any utilization of parallelism, and that its size directly affects the performance of the algorithm.

The size of the common memory required by a given parallel algorithm is determined by two principal factors.

(a) Input availability.

The size of the input, in the case that the input is placed in the common memory, or the need to transfer input data in the case that the input is initially distributed among the local memories.

(b) Cooperation between processors.

The transmission of intermediate results between processors, utilized to obtain fast processing time.
Here we propose to concentrate on point (b). For this reason we put the input in a "read only" common memory.

In this paper we concentrate on parallel RAMs (PRAMs), in particular on the Concurrent-Read Concurrent-Write PRAM (CRCW PRAM) and the Concurrent-Read Exclusive-Write PRAM (CREW PRAM). Both models are precisely defined in section 2. In these models processors communicate via a shared memory. Therefore the size of the communication facility of the machine, here called the communication width (or width, for short), is simply the number of shared memory cells. We consider the width $m$ a resource, together with the size $p$ (the number of processors) and the depth $T$ (the running time), and we seek trade-offs between the three.

One of the subtleties in proving lower bounds for these models is that information may be communicated by the fact that no processor writes into a common memory cell. We introduce a novel technique to deal with this difficulty.

For a large class of functions, which includes Parity and Majority, we prove $T = \Omega(\sqrt{n/m})$ on the CRCW PRAM, where $n$ is the size of the input. This lower bound is tight for all values of width $m = O(n/\log^2 n)$. This is the first time that non-trivial tight lower bounds have been achieved for a model that allows concurrent write access. The only previously known lower bound for the CRCW PRAM model is given in Stockmeyer and Vishkin [SV-82]. They show, using a result of Furst, Saxe and Sipser [FSS-81], that it is impossible to compute Parity in this model in constant time using a polynomial number of processors. There is, however, a large gap between this lower bound and the best upper bound known for a polynomial number of processors, which is $O(\log n/\log\log n)$ (see [CSV-82]).

For another class of functions, which includes the functions AND and OR, we prove a lower bound of $T = \Omega((n/m)^{1/3})$ on the CREW PRAM. This lower bound extends the $\Omega(\log n)$ bound of Cook and Dwork [CD-82] for small values of $m$, and further separates the power of the CRCW PRAM from that of the CREW PRAM. We also state, and give a proof of, a new result by Beame that achieves a tight lower bound for computing the OR in this model. For a different class of functions (that includes OR) he proves $T = \Omega(\sqrt{n/m})$.

Both our lower bounds hold regardless of the number of processors, while the upper bounds are achieved with the smallest possible number of processors (up to a constant factor).

Our study of values of $m$ that are smaller than the input size requires us to add a read-only input tape to the model, as is done in the study of space bounded Turing machines. The interest in those values is not solely theoretical; it is well founded in practice. For example, the "Ethernet" can be considered as a PRAM with only one shared memory cell. Also, the papers of Gottlieb et al. [GGKMRS-82], Kuck [K-77] and Vishkin [V-82] imply that minimizing the size of the shared memory (that can be accessed in parallel) may determine the hardware feasibility of the parallel machine.

The paper is organized as follows: precise definitions and the lower bounds are given in section 2. Section 3 contains the upper bounds, and section 4 concludes the paper and suggests further research directions. To improve the readability of section 2, some of the proofs are deferred to the appendix.

2. Lower Bounds

In the first subsection we give precise definitions of the models of computation when the communication width is $m = 1$, and of the types of functions we are interested in. Subsections 2.2 and 2.3 contain the lower bound proofs for the concurrent-write and exclusive-write models respectively when $m = 1$. In the last subsection we show how to extend the lower bounds to arbitrary communication width.

2.1. Definitions

Definition 2.1: A CRCW PRAM(1) consists of a set $\Pi = \{p_1, p_2, \ldots\}$ of processors, a number $n$ of inputs, $n$ read-only input cells $X(1), X(2), \ldots, X(n)$, one common memory cell $C$, an alphabet $\Sigma$ and an execution time $T$.

Each processor $p_i$ has a set of states $Q_i$ and functions
$\rho_i : Q_i \to \{1, 2, \ldots, n\}$, the next input cell to be read,
$\sigma_i : Q_i \to \Sigma$, the symbol to be written into $C$, and
$\delta_i : Q_i \times \Sigma \times \Sigma \to Q_i$, the state transition function.

At each time period $t = 0, 1, \ldots, T$ each processor $p_i$ is in a state $q_i^t \in Q_i$, and the cell $C$ contains a symbol $s^t \in \Sigma$. At time $t = 0$ the input cell $X(i)$ contains the input $x_i \in \Sigma$ ($1 \le i \le n$), the cell $C$ contains a designated symbol $b_0 \in \Sigma$, and every processor $p_i$ is in an initial state $q_i^0 \in Q_i$. In general

$$q_i^{t+1} = \delta_i\bigl(q_i^t, X(j), s^t\bigr), \quad \text{where } j = \rho_i(q_i^t),$$

and

$$s^{t+1} = \begin{cases} s^t & \text{if } \sigma_i(q_i^{t+1}) = s^t \text{ for every } i \text{ (no one writes)},\\ \sigma_i(q_i^{t+1}) & \text{where } i \text{ is the smallest index s.t. } \sigma_i(q_i^{t+1}) \ne s^t. \end{cases}$$

The value $f(x_1, x_2, \ldots, x_n)$ of the function $f$ computed by the PRAM(1) is the contents $s^T$ of $C$ at time $T$.
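To make the step rule concrete, here is a minimal Python sketch of Definition 2.1. It assumes, as a hypothetical representation of our own, that the functions $\rho_i$, $\sigma_i$, $\delta_i$ are given as lists of Python callables `rho`, `sigma`, `delta`, and that processors and input cells are indexed from 0. The second usage line shows how a writer can encode its serial number in the symbol it writes, so that the priority rule reveals the smallest-indexed writer.

```python
def crcw_step(states, s, x, rho, sigma, delta):
    """One step t -> t+1 of a CRCW PRAM(1) following Definition 2.1 (0-based)."""
    # Every processor reads one input cell and updates its state.
    new_states = [delta[i](q, x[rho[i](q)], s) for i, q in enumerate(states)]
    # Priority write: the smallest index whose symbol differs from s^t wins.
    for i, q in enumerate(new_states):
        symbol = sigma[i](q)
        if symbol != s:
            return new_states, symbol
    return new_states, s  # no one writes: C keeps its contents


def run_crcw(x, T, b0, q0, rho, sigma, delta):
    """Run T steps from initial states q0 with C = b0; the output is s^T."""
    states, s = list(q0), b0
    for _ in range(T):
        states, s = crcw_step(states, s, x, rho, sigma, delta)
    return s


# Toy machine with n = 3 processors: p_i reads X(i) and keeps the bit as its state.
n = 3
rho = [lambda q, i=i: i for i in range(n)]
delta = [lambda q, xj, s: xj for _ in range(n)]
sigma_bit = [lambda q: q for _ in range(n)]                           # propose the bit read
sigma_id = [lambda q, i=i: i + 1 if q == 1 else 0 for i in range(n)]  # propose own serial number

print(run_crcw((0, 1, 1), 1, 0, [0] * n, rho, sigma_bit, delta))  # 1: the OR of the input
print(run_crcw((0, 1, 1), 1, 0, [0] * n, rho, sigma_id, delta))   # 2: smallest writer is p_2
```

With $b_0 = 0$ this toy machine computes the OR in a single step, which is consistent with the constant-time OR algorithm for the CRCW PRAM(1) discussed in subsection 2.3.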

Definition 2.2: A CREW PRAM(1) is defined exactly like the CRCW PRAM(1), with only one exception: at each time period $t$ there can be at most one processor that writes, i.e. at most one $i$ s.t. $\sigma_i(q_i^{t+1}) \ne s^t$.
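A CREW PRAM(1) is therefore a CRCW PRAM(1) restricted to executions with at most one writer per step. The Python sketch below checks that condition for a single step, assuming (hypothetically) that each $\sigma_i$ is given as a Python callable in a list `sigma` and that processors are indexed from 0.

```python
def writers(new_states, s, sigma):
    """Indices i with sigma_i(q_i^{t+1}) != s^t, i.e. the processors that write."""
    return [i for i, q in enumerate(new_states) if sigma[i](q) != s]


def is_exclusive_write(new_states, s, sigma):
    """Definition 2.2: at most one processor may write in a step."""
    return len(writers(new_states, s, sigma)) <= 1


sigma_bit = [lambda q: q] * 3                        # each processor proposes the bit it holds
print(is_exclusive_write([0, 1, 0], 0, sigma_bit))   # True: only p_2 writes
print(is_exclusive_write([0, 1, 1], 0, sigma_bit))   # False: p_2 and p_3 both write
```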

Remarks: These lower bound models allow $\Pi$, $\Sigma$ and $Q_i$ to be infinite, and allow the processors to be non-uniform (i.e. to have different programs for different values of $n$). Also note that we use the convention that a processor writes if it tries to change the contents of $C$ (and, in the CRCW, the processor whose write takes effect must be the one with the smallest serial number doing so).

Definition 2.3: Let $\Sigma$ be a set, $I \subseteq \Sigma^n$ and $f : I \to \Sigma$ some function. An input $x = x_1, x_2, \ldots, x_n \in I$ is said to be $k$-sensitive w.r.t. $f$ if for every subset $J \subseteq \{1, 2, \ldots, n\}$ with $|J| = k - 1$ there exists another input $y = y_1, y_2, \ldots, y_n \in I$ s.t. $x_j = y_j$ for all $j \in J$, and $f(x) \ne f(y)$. If $k$ is the largest integer s.t. every input (resp. some input) $x \in I$ is $k$-sensitive w.r.t. $f$, then $f$ is said to be $k$-sensitive everywhere (resp. $k$-sensitive somewhere).

Examples: Consider the functions Parity, Majority, OR : $\{0,1\}^n \to \{0,1\}$.

Parity is $n$-sensitive everywhere.

Majority is $\lceil n/2 \rceil$-sensitive everywhere.

OR is only 1-sensitive everywhere for all $n$, but it is $n$-sensitive somewhere (the all-zeros input is $n$-sensitive w.r.t. OR).
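These examples can be checked by brute force for small $n$. The Python sketch below applies Definition 2.3 literally; it is exponential in $n$ and meant only as an illustration. The function names are ours, and Majority is taken here to mean "strictly more ones than zeros" (an assumption; for odd $n$ there is no ambiguity).

```python
from itertools import combinations, product


def is_k_sensitive(f, x, k, domain=(0, 1)):
    """Definition 2.3: for every J with |J| = k-1, some y agrees with x on J and f(y) != f(x)."""
    n, fx = len(x), f(x)
    for J in combinations(range(n), k - 1):
        free = [i for i in range(n) if i not in J]
        found = False
        for values in product(domain, repeat=len(free)):
            y = list(x)
            for i, v in zip(free, values):
                y[i] = v
            if f(tuple(y)) != fx:
                found = True
                break
        if not found:
            return False
    return True


def sensitivity(f, x, domain=(0, 1)):
    """Largest k such that x is k-sensitive w.r.t. f (0 if f is constant)."""
    k = 0
    while is_k_sensitive(f, x, k + 1, domain):
        k += 1
    return k


def everywhere_and_somewhere(f, n, domain=(0, 1)):
    """(k everywhere, k somewhere) for f on the full domain domain^n."""
    ks = [sensitivity(f, x, domain) for x in product(domain, repeat=n)]
    return min(ks), max(ks)


def parity(x):
    return sum(x) % 2


def majority(x):
    return int(2 * sum(x) > len(x))  # strict majority of ones


def or_bits(x):
    return int(any(x))


print(everywhere_and_somewhere(parity, 5))    # (5, 5)
print(everywhere_and_somewhere(majority, 5))  # (3, 3)
print(everywhere_and_somewhere(or_bits, 5))   # (1, 5)
```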

2.2. Lower bounds for the CRCW PRAM(1)

Theorem 2.1: Let $M$ be a CRCW PRAM(1) that computes a $k$-sensitive everywhere function $f$ in time $T$. Then $T = \Omega(\sqrt{k})$.

Corollary 2.1: Let $M$ be a CRCW PRAM(1) that computes the Parity, Majority (Sum, Max) function on $n$ bits (integers) in time $T$. Then $T = \Omega(\sqrt{n})$.

Proof: Parity, Sum and Max are $n$-sensitive everywhere. Majority is $\lceil n/2 \rceil$-sensitive everywhere. ■

Let us informally discuss the difficulties we face in trying to prove Theorem 2.1. Consider the behaviour of the machine in time period $t$. There are two possible cases:

Case 1: No processor writes into $C$ ($s^t = s^{t-1}$).

Case 2: Some processor (say $p_j$) writes into $C$ ($s^t \ne s^{t-1}$).

We have to analyze the information that is transferred in each case. Consider first Case 1. As Cook and Dwork [CD-82] point out, information is transferred in this case, namely the information that nobody wrote. They show how this information can be used in an algorithm for the OR function that is faster than the obvious one. The way they keep track of this elusive information is heavily based on the fact that their model does not allow simultaneous write access to the same memory cell. (Indeed, their lower bound does not hold for the CRCW PRAM.) As our model allows simultaneous write access, we had to choose an approach that is different from theirs.

The information that is transferred in Case 2 seems even more slippery. We know what was written into $C$, and in addition we know that no processor with serial number smaller than $j$ tried to write. (Note that since $\Sigma$ may be infinite, the writer can encode its serial number in the symbol it writes.) This case is much simpler in the exclusive-write model, since there, if someone writes, there can be no other processor that tries to write!

At this point we need some notation. Let $I$ denote the (non-empty) set of all possible inputs (the domain). Fix a time period $t$ and let $\beta = s^0 s^1 \cdots s^{t-1}$ be the string of successive symbols in $C$ in time periods $0, 1, \ldots, t-1$; $\beta$ is called the history through time $t$. Denote by $I_\beta \subseteq I$ the subset of inputs that have history $\beta$ through time $t$.

Our analysis will be based on the observation that Cases 1 and 2 each consist of two subcases. Fix $\beta$, a history through time $t$.

Case 1a: There is no input in $I_\beta$ for which some processor writes at time $t$.
Case 1b: There is an input in $I_\beta$ for which some processor writes at time $t$.
Case 2a: There is no input in $I_\beta$ for which some processor with smaller serial number than $j$ writes at time $t$.
Case 2b: There is an input in $I_\beta$ for which a processor with smaller serial number than $j$ writes at time $t$.

It turns out that Cases 1a and 2a are simple to analyze. Intuitively, in Case 1a no new information is transferred, as $\beta$ itself contains the information that no one will write at time $t$. Similarly, in Case 2a, $\beta$ contains the information that no processor with a smaller serial number than the writer could have written, so the only new piece of information is the new symbol in $C$, $s^t$.


Now, rather than confronting the elusive information that is transferred in Cases 1b and 2b, we avoid (or circumvent) it, and hence coin the name circumvention for this technique. Showing that we can restrict ourselves to the "easy to analyze" cases is the heart of our argument.

Let $I(\{(i_1, y_1), \ldots, (i_l, y_l)\}) = \{x \in I \mid x_{i_j} = y_j,\ 1 \le j \le l\}$ denote the set of all inputs ($n$-tuples) whose projection on the $l$-tuple $(i_1, i_2, \ldots, i_l)$ is $(y_1, y_2, \ldots, y_l)$.

Remark: We switch here from qualifying inputs by their history ("range" qualification) to qualifying them by their values at given coordinates ("domain" qualification). This yields a simpler and more intuitive proof than our original one, which used range qualification. However, we believe that range qualification is more powerful, and that it may be used to prove lower bounds when domain qualification fails.

The following iterative definition will generate an "easy to analyze" set of inputs, i.e. inputs for which Cases 1b and 2b never occur. For every $t$, $D^t$ will contain pairs of "fixed" input positions and their values, and $E^t = I(D^t)$.

Let $E^0 = I$ and $D^0 = \emptyset$. Consider time period $t$ and define $E^t, D^t$ according to the following:

Case 1: There is no processor $p_j$ and no input $x \in E^{t-1}$ such that $p_j$ writes on $x$ at time $t$. Then
$E^t \leftarrow E^{t-1}$, $D^t \leftarrow D^{t-1}$.

Case 2: There is a processor $p_j$ and an input $x \in E^{t-1}$ s.t. $p_j$ writes on $x$ at time $t$. Let $p_l$ and $y \in E^{t-1}$ be such that $p_l$ writes on $y$ at time $t$, and $l$ is the smallest serial number of any processor that writes at time $t$ on any input in $E^{t-1}$. Let $i_1, i_2, \ldots, i_u$ and $y_{i_1}, y_{i_2}, \ldots, y_{i_u}$ be the sets of input cells and their contents (respectively) that were read by $p_l$ up to time $t$. (Clearly $u \le t$.) Then
$E^t \leftarrow E^{t-1} \cap I(\{(i_1, y_{i_1}), \ldots, (i_u, y_{i_u})\})$,
$D^t \leftarrow D^{t-1} \cup \{(i_1, y_{i_1}), \ldots, (i_u, y_{i_u})\}$.

It is easy to see that for every $0 \le t \le T$:

(1) $|D^t| \le |D^{t-1}| + t$ and $|D^0| = 0$, and hence $|D^t| \le t(t+1)/2$.

(2) $E^t = I(D^t)$, $D^{t-1} \subseteq D^t$, $E^t \subseteq E^{t-1}$.

(3) $E^t \ne \emptyset$.


In particular we have:
Lemma 2.1: $E^T \ne \emptyset$ and $|D^T| \le T(T+1)/2$.

Remark. The definition above generates a set $E^T$ of "easy to analyze" inputs, regardless of the function being computed. Therefore we believe that this technique can be used to prove lower bounds for the computation of other functions in this model.

Lemma 2.2: Let $M$ be a CRCW PRAM(1) computing a function $f$, and let $E^T$ be defined as above for $M$. Then for every $x, y \in E^T$, $f(x) = f(y)$.

A rigorous proof of this lemma is given in the appendix. The idea is to show, inductively on $t$, that any processor which writes at time $t$ on some input in $E^T$ has exactly the same computation through time $t$ on every input in $E^T$.
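Proof of Theorem 2.1: Recall that $M$ computes a $k$-sensitive everywhere function $f$ in time $T$. Suppose that $T(T+1)/2 < k$. Then $|D^T| < k$ by Lemma 2.1. Pick any $x \in E^T$; the positions fixed by $D^T$ can be extended to a set $J$ of $k-1$ positions, so by Definition 2.3 there is an input $y$ that agrees with $x$ on $J$ (hence $y \in I(D^T) = E^T$) with $f(x) \ne f(y)$. This contradicts Lemma 2.2. Therefore $T(T+1)/2 \ge k$, so $T = \Omega(\sqrt{k})$. ■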
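As a numeric illustration of the last step, the Python sketch below computes the smallest integer $T$ permitted by the inequality $T(T+1)/2 \ge k$ and prints it next to $\sqrt{2k}$. The function name is ours, and the sketch tracks only this inequality, not any other constants of the model.

```python
import math


def min_depth_crcw1(k):
    """Smallest integer T with T(T+1)/2 >= k, the bound from the proof of Theorem 2.1."""
    T = (math.isqrt(8 * k + 1) - 1) // 2  # largest T with T(T+1)/2 <= k
    while T * (T + 1) // 2 < k:
        T += 1
    return T


for k in (10, 11, 10**6):
    print(k, min_depth_crcw1(k), round(math.sqrt(2 * k), 1))
```

For Parity on $n = 10^6$ bits ($k = n$) this gives $T \ge 1414$ on a CRCW PRAM(1).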

2.3. Lower Bounds for the CREW PRAM(1)

Consider the OR function of $n$ bits. As mentioned earlier, the OR is just 1-sensitive everywhere, so the results in the previous subsection imply only a constant time lower bound for it on the CRCW PRAM(1). Indeed, there is a two-step algorithm for the OR on this model, as follows. In the first step, the common memory cell $C$ is initialized with '0'. In the second step, each processor $p_i$ reads the $i$th input position and writes a '1' into $C$ iff the value it read was '1'.
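The two-step algorithm can be sketched in Python as follows. Besides the output, the sketch reports how many processors write in the second step, which is the quantity that decides whether the same algorithm is legal on a CREW PRAM. The representation (a list of bits, processors indexed from 0) is our assumption.

```python
def or_two_steps(x):
    """Two-step OR on a CRCW PRAM(1) with n processors; returns (C, writers in step 2)."""
    C = 0                                                 # step 1: C is initialised with '0'
    writers = [i for i, bit in enumerate(x) if bit == 1]  # step 2: p_i writes iff x_i = 1
    if writers:
        C = 1  # all writers write '1'; under priority p_{min(writers)} succeeds
    return C, len(writers)


print(or_two_steps([0, 0, 0, 0]))  # (0, 0)
print(or_two_steps([0, 1, 0, 0]))  # (1, 1): legal on a CREW PRAM as well
print(or_two_steps([1, 1, 0, 1]))  # (1, 3): a write conflict, illegal on a CREW PRAM
```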

It is clear why this algorithm is not valid for a CREW PRAM: several processors may write into $C$ at the same time. Note, however, that if the domain consists only of inputs which have at most one position containing a '1', a write conflict cannot occur, and the algorithm is valid for the CREW PRAM. For this reason we restrict ourselves here to functions with a full domain (i.e. $I = \Sigma^n$). The main result in this subsection is:

Theorem 2.2: Let $N$ be a CREW PRAM(1) that computes a $k$-sensitive somewhere function $g$ in time $T$. Then $T = \Omega(k^{1/3})$.

Corollary 2.2: If $g$ is the OR function on $n$ bits, then $T = \Omega(n^{1/3})$.

In an earlier version of this paper we conjectured that the lower bound of Corollary 2.2 can be improved to $T = \Omega(\sqrt{n})$. This was recently proved by Beame [B-83]. In fact, he proved the following stronger theorem:


Theorem 2.3 (Beame): Let $N$ be a CREW PRAM(1) that computes a function $g : \{0,1\}^n \to \{0,1\}$ in time $T$. If there exists an input $e \in I$ s.t. $|\{x \in I : g(x) = g(e)\}| \le |I|/r$, then $T = \Omega(\sqrt{\log r})$.

It immediately follows (taking $e$ to be the all-zeros input, for which $r = 2^n$) that:
Corollary 2.3 (Beame): If $g$ is the OR function on $n$ bits, then $T = \Omega(\sqrt{n})$.

The proof of Theorem 2.3 has the same structure as that of Theorem 2.2. However, while we focus on the sensitivity of inputs in the lower bound argument, Beame focuses on a different parameter, namely the number of inputs with the same image. His proof is of independent interest, and we include it in the appendix.
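To see how Beame's parameter differs from sensitivity, the Python sketch below computes, for $g$ on the full domain $\{0,1\}^n$, the largest $r$ for which some $e$ satisfies $|\{x : g(x) = g(e)\}| \le |I|/r$. For OR it is $2^n$ (take $e = 0^n$), which is where the $\Omega(\sqrt{n})$ bound comes from; for Parity it is only 2, so Theorem 2.3 yields just a constant there. The function name `beame_r` is ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def beame_r(g, n):
    """Largest r such that some e in {0,1}^n has |{x : g(x) = g(e)}| <= 2^n / r."""
    inputs = list(product((0, 1), repeat=n))
    preimage_sizes = Counter(g(x) for x in inputs)
    # The best choice of e lies in the smallest preimage class.
    return Fraction(len(inputs), min(preimage_sizes.values()))


def g_or(x):
    return int(any(x))


def g_parity(x):
    return sum(x) % 2


print(beame_r(g_or, 6), beame_r(g_parity, 6))  # 64 2
```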

We return to the proof of Theorem 2.2. The idea is to use the framework of the previous subsection, namely to construct a set of inputs $E^T$, and show that for the computed function to be constant on $E^T$, $T$ must be large. This task was relatively easy for everywhere sensitive functions, since we did not have to worry about the contents of $E^T$, as every input is sensitive. To use the sensitivity of inputs of a somewhere sensitive function in a similar argument, we must make sure that $E^T$ contains at least one sensitive input. This motivates the following inductive definition of the sets $D^t, E^t$.

Let $g : I \to \Sigma$ be the function being computed and $e = e_1, e_2, \ldots, e_n \in I$ be a $k$-sensitive input w.r.t. $g$. Set $E^0 = I$ and $D^0 = \emptyset$. Consider time period $t$ and define $E^t, D^t$ as follows:

Case 1: There is no processor $p_j$ and no input $x \in E^{t-1}$ such that $p_j$ writes on $x$ at time $t$. Then
$E^t \leftarrow E^{t-1}$, $D^t \leftarrow D^{t-1}$.
Case 2: There is a (unique) processor $p_j$ that writes on $e \in E^{t-1}$ at time $t$. Let $i_1, i_2, \ldots, i_u$ and $e_{i_1}, e_{i_2}, \ldots, e_{i_u}$ be the sets of input cells and their contents (respectively) that were read by $p_j$ up to time $t$. (Clearly $u \le t$.) Then
$D^t \leftarrow D^{t-1} \cup \{(i_1, e_{i_1}), \ldots, (i_u, e_{i_u})\}$,
$E^t \leftarrow E^{t-1} \cap I(\{(i_1, e_{i_1}), \ldots, (i_u, e_{i_u})\})$.

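Case 3: There exists $x \in E^{t-1}$, $x \ne e$, s.t. some $p_j$ writes on $x$ at time $t$, but no processor writes on $e$ at time $t$. Let $R_0^t$ be a set of positions s.t. if $y \in E^{t-1}$ and $y_i = e_i$ for all $i \in R_0^t$, then no processor writes on $y$ at time $t$. In this case we fix the positions $R_0^t$ with values of $e$:
$D^t \leftarrow D^{t-1} \cup \{(i, e_i) : i \in R_0^t\}$,
$E^t \leftarrow E^{t-1} \cap I(\{(i, e_i) : i \in R_0^t\})$.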

It is easy to see inductively that $e \in E^t$ for all $t$, and so $e \in E^T$. Our main problem is to obtain an upper bound on $|R_0^t|$.

Lemma 2.3: For every $t$, $|R_0^t| \le t(t+1)/2$.

This lemma is the heart of the lower bound. Since the proof is long, it is deferred to the appendix.

Lemma 2.4: For every $t$, $|D^t| \le t(t+1)(t+2)/6$ and $e \in E^t$.
Proof: By simple induction on $t$. ■
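The induction behind Lemma 2.4 can be spelled out as follows. At time $t$, Case 1 adds nothing to $D^t$, Case 2 adds at most $u \le t$ pairs, and Case 3 adds at most $|R_0^t| \le t(t+1)/2$ pairs by Lemma 2.3; since $t \le t(t+1)/2$ for $t \ge 1$, summing the per-step increments gives a tetrahedral number:

```latex
|D^t| \le |D^{t-1}| + \frac{t(t+1)}{2}, \quad |D^0| = 0
\quad\Longrightarrow\quad
|D^t| \le \sum_{\tau=1}^{t} \frac{\tau(\tau+1)}{2}
       = \frac{1}{2}\left(\frac{t(t+1)(2t+1)}{6} + \frac{t(t+1)}{2}\right)
       = \frac{t(t+1)(t+2)}{6}.
```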
Lemma 2.5: For every $x, y \in E^T$, $g(x) = g(y)$.
Proof: Exactly the same as the proof of Lemma 2.2. ■

Proof of Theorem 2.2: Recall that $N$ computes a $k$-sensitive somewhere function $g$ in time $T$. Suppose $T(T+1)(T+2)/6 < k$. Then by Lemma 2.4 $|D^T| < k$. Since $e \in E^T$, by Definition 2.3 there must be a $y \in E^T$ s.t. $g(y) \ne g(e)$, which contradicts Lemma 2.5. Hence $T(T+1)(T+2)/6 \ge k$, i.e. $T = \Omega(k^{1/3})$. ■

2.4. Arbitrary Communication Width

What happens when the communication width is larger than 1? The CRCW PRAM(m) is defined similarly to the CRCW PRAM(1), only now there are $m$ common memory cells $C(1), C(2), \ldots, C(m)$ to which the processors have concurrent read/write access. In a similar fashion the CREW PRAM(m) can be defined. Our results are summarized in the following theorem.

Theorem 2.4:

Let $M$ be a CRCW PRAM(m) that computes a $k$-sensitive everywhere function $f : I\,(\subseteq \Sigma^n) \to \Sigma$ in time $T$. Then $T = \Omega(\sqrt{k/m})$.

In particular, if $f \in \{\text{Parity, Majority, Sum, Max}\}$, then $T = \Omega(\sqrt{n/m})$.

Let $N$ be a CREW PRAM(m) that computes a $k$-sensitive somewhere function $g : \Sigma^n \to \Sigma$ in time $T$. Then $T = \Omega((k/m)^{1/3})$.

In particular, if $g \in \{\text{AND, OR}\}$, then $T = \Omega((n/m)^{1/3})$.

The only difficulty in extending our technique to prove Theorem 2.4 lies in the definition of the "easy to analyze" cases. For example, one can construct a machine for which the following happens: there are inputs for which both $C(1)$ and $C(2)$ are written into. However, if we choose an input for which the smallest-numbered processor writes into $C(1)$, no one will write into $C(2)$, and vice versa.
We overcome this difficulty by conceptually serializing the write access to different cells as follows. Each time unit $t$ is sliced into $m$ slices, so that in the $i$th slice only cell $C(i)$ may be written into. Then, at the $i$th slice of time period $t$ we can refer not only to the contents of all cells at previous time periods, but also to the contents of cells $C(1)$ to $C(i-1)$ at time period $t$. (Note that the machine is not affected by this conceptual slicing. Indeed, it shows that our results hold even in a stronger model that allows the processors to access all common memory cells at each time unit.) As a result we are able to define sets $E^{t,i}$ and $D^{t,i}$, $0 \le t \le T$, $1 \le i \le m$, inductively, in a similar fashion to the previous subsections, for the CRCW PRAM(m) and the CREW PRAM(m) respectively. The only refinement is that instead of defining $E^t$ from $E^{t-1}$, we define $E^{t,i}$ from $E^{t,i-1}$ when $i > 1$, and $E^{t,1}$ from $E^{t-1,m}$.
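A minimal Python sketch of why the slicing is harmless, under simplifying assumptions of our own: in one step each processor proposes at most one write, encoded as a hypothetical pair `(cell, symbol)` (cells indexed from 0), and, as in Definition 2.1, a write counts only if it changes the cell. Resolving all cells at once and resolving them slice by slice give the same memory; the sliced version additionally records the snapshot after each slice, which is the extra information the refined sets $E^{t,i}$ may condition on.

```python
def resolve_simultaneous(mem, proposals):
    """One CRCW PRAM(m) write phase: per cell, the smallest-index changing writer wins."""
    new = list(mem)
    taken = set()
    for prop in proposals:  # proposals are listed in serial-number order
        if prop is None:
            continue
        cell, symbol = prop
        if symbol != mem[cell] and cell not in taken:
            new[cell] = symbol
            taken.add(cell)
    return new


def resolve_sliced(mem, proposals):
    """The same step cut into m slices; slice i may only write cell i.
    Returns the final memory and the memory snapshot after each slice."""
    cur, snapshots = list(mem), []
    for i in range(len(mem)):
        for prop in proposals:
            if prop is not None and prop[0] == i and prop[1] != mem[i]:
                cur[i] = prop[1]
                break
        snapshots.append(list(cur))
    return cur, snapshots


mem = [0, 0, 0]
proposals = [None, (1, 5), (0, 7), (1, 9)]  # p_1 idle; p_2 and p_4 compete for C(2)
final, snaps = resolve_sliced(mem, proposals)
print(resolve_simultaneous(mem, proposals), final, snaps)
```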

The analysis of the previous subsections carries through in a straightforward manner w.r.t. the final sets $E^{T,m}$ and $D^{T,m}$. This includes the proof of the following two lemmas and the conclusion of the theorem from them.
Lemma 2.6:

In the CRCW PRAM(m), $|D^{T,m}| \le mT(T+1)/2$.
In the CREW PRAM(m), $|D^{T,m}| \le mT(T+1)(T+2)/6$.

Lemma 2.7:

In the CRCW PRAM(m), for every $x, y \in E^{T,m}$, $f(x) = f(y)$.

In the CREW PRAM(m), for every $x, y \in E^{T,m}$, $g(x) = g(y)$.
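For completeness, the conclusion of Theorem 2.4 from Lemmas 2.6 and 2.7 is the same counting as for $m = 1$: if $|D^{T,m}| < k$, the argument in the proofs of Theorems 2.1 and 2.2 (in the CREW case using $e \in E^{T,m}$) yields two inputs in $E^{T,m}$ with different values, contradicting Lemma 2.7. Hence $|D^{T,m}|$ must reach $k$, and Lemma 2.6 gives

```latex
\begin{aligned}
\text{CRCW PRAM}(m):&\quad \frac{mT(T+1)}{2} \ge k \;\Rightarrow\; (T+1)^2 > \frac{2k}{m} \;\Rightarrow\; T > \sqrt{2k/m} - 1,\\
\text{CREW PRAM}(m):&\quad \frac{mT(T+1)(T+2)}{6} \ge k \;\Rightarrow\; (T+2)^3 > \frac{6k}{m} \;\Rightarrow\; T > (6k/m)^{1/3} - 2,
\end{aligned}
```

which are $T = \Omega(\sqrt{k/m})$ and $T = \Omega((k/m)^{1/3})$ respectively.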

We conclude this subsection with two observations:

1) The ideas outlined above can also be used to extend Beame's theorem (Theorem 2.3) to arbitrary communication width, as follows.

Theorem 2.5: Let $N$ be a CREW PRAM(m) that computes a function $g : \{0,1\}^n \to \{0,1\}$ in time $T$. If there exists an input $e \in I$ s.t. $|\{x \in I : g(x) = g(e)\}| \le |I|/r$, then $T = \Omega(\sqrt{(\log_2 r)/m})$.

In particular, if $g$ is the OR function, then $T = \Omega(\sqrt{n/m})$.

2) Two other concurrent-write models of parallel computation have appeared in the literature ([SV-81], [ShV-82]). They differ from our CRCW PRAM in the way they resolve write conflicts: in the first, all processors that access the same memory location must write the same value; in the second there is no such restriction, but we do not know in advance which processor succeeds in writing. We conclude this section by mentioning that those two models are weaker than ours, and therefore our results for the CRCW PRAM hold for them as well.
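The claim that the other two models are weaker can be checked for a single write conflict. In the Python sketch below (our own encoding, with writes given as `(serial_number, value)` pairs), the Priority outcome used in our CRCW PRAM is always one of the outcomes the Arbitrary rule allows, and it equals the common value whenever the Common restriction holds. Hence an algorithm that is correct under either rule stays correct under Priority, so lower bounds for Priority transfer to both.

```python
def priority(writes):
    """Our CRCW rule: the writer with the smallest serial number succeeds."""
    return min(writes, key=lambda w: w[0])[1]


def common_allowed(writes):
    """Common rule: a concurrent write is legal only if all values agree."""
    return len({value for _, value in writes}) == 1


def arbitrary_outcomes(writes):
    """Arbitrary rule: any one of the written values may end up in the cell."""
    return {value for _, value in writes}


conflicts = [[(3, "a"), (1, "b"), (2, "a")], [(5, "x"), (2, "x")]]
for ws in conflicts:
    assert priority(ws) in arbitrary_outcomes(ws)
    if common_allowed(ws):
        assert priority(ws) == ws[0][1]
print([priority(ws) for ws in conflicts])
```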

3. Upper Bounds

All upper bounds can be achieved on the weakest version of a PRAM, namely the Exclusive-Read Exclusive-Write PRAM (EREW PRAM). It is similar to the CREW PRAM, except that in this model any simultaneous access to a shared memory cell is forbidden. The algorithms are simple and will be described informally. They will be given only for the problem of summing $n$ numbers; it is easy to see that they apply to computing any associative function.

Consider first the EREW PRAM(1) model. The $n$ numbers $a_1, a_2, \ldots, a_n$ are initially stored on the read-only input tape. Let $L_j$ be a local memory cell of processor $p_j$, and let $C$ be the common memory cell. The algorithm is described in Figure 1.


Time   p1        p2              p3              p4
 1     C ← a1    L2 ← a2         L3 ← a4         L4 ← a7
 2               L2 ← L2 + a3    L3 ← L3 + a5    L4 ← L4 + a8
                 C ← C + L2
 3                               L3 ← L3 + a6    L4 ← L4 + a9
                                 C ← C + L3
 4                                               L4 ← L4 + a10
                                                 C ← C + L4

Figure 1: Summation with one common memory cell


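Clearly, only $p = O(\sqrt{n})$ processors are active in this algorithm (processor $p_i$ handles a block of $i$ inputs, so $p(p+1)/2 \ge n$ suffices), and the sum is computed in $O(\sqrt{n})$ time. Since the sequential time for summation is $\Omega(n)$, a straightforward lower bound of $\Omega(n/p)$ exists for any parallel machine with $p$ processors; achieving time $O(\sqrt{n})$ therefore requires $\Omega(\sqrt{n})$ processors. Hence the number of processors is optimal up to a constant factor.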
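The schedule of Figure 1 can be simulated directly. The Python sketch below assumes, as our own generalization of the figure to arbitrary $n$, that processor $p_i$ sums a block of $i$ consecutive inputs (the last block may be shorter), reading one input per step, and adds its local sum into $C$ at step $i$; so exactly one processor accesses $C$ per step. It returns the sum and the number of steps, which equals the number of processors $p$, the smallest integer with $p(p+1)/2 \ge n$. The function names are ours.

```python
def blocks_for(n):
    """Consecutive blocks of sizes 1, 2, 3, ... covering input indices 0..n-1."""
    blocks, start, size = [], 0, 1
    while start < n:
        blocks.append(range(start, min(start + size, n)))
        start += size
        size += 1
    return blocks


def erew_sum_one_cell(a):
    """Simulate the Figure 1 schedule; return (sum, number of steps = processors)."""
    blocks = blocks_for(len(a))
    p = len(blocks)
    local = [0] * p  # L_i: private cell of processor p_i
    C = 0            # the single common memory cell
    for t in range(1, p + 1):  # time steps 1..p
        for i, block in enumerate(blocks):
            if t <= len(block):  # p_i reads its t-th input (all in parallel)
                local[i] += a[block[t - 1]]
        C += local[t - 1]  # only p_t accesses C at step t
    return C, p


print(erew_sum_one_cell(list(range(1, 11))))  # the Figure 1 instance: (55, 4)
print(erew_sum_one_cell([1] * 100))           # (100, 14)
```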
Consider now the same problem for the CRCW PRAM(m), where $m = O(n/\log^2 n)$. We show how to achieve $O(\sqrt{n/m})$ time with $O(\sqrt{nm})$ processors. The algorithm has two phases:

(1) Partition the $n$ inputs into $m$ subsets of size roughly $n/m$ each. Assign to each subset $\sqrt{n/m}$ processors and one common memory cell. For each subset the sum is computed in the respective memory cell using the algorithm above, in time $O(\sqrt{n/m})$.

(2) Sum up the $m$ values in the common memory using $m$ ($\le \sqrt{nm}$) processors in $O(\log m)$ time in the obvious way.
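The two phases can be sketched in Python as a step-count model: the sketch computes the group sums directly and only counts the steps the parallel schedules would take, with the one-cell schedule of Figure 1 inside each group in phase 1 and a balanced binary tree in phase 2. The function names and the exact partition rule (groups of $\lceil n/m \rceil$ consecutive inputs) are our assumptions.

```python
import math


def one_cell_steps(k):
    """Steps of the one-cell schedule on k inputs: smallest p with p(p+1)/2 >= k."""
    p = 0
    while p * (p + 1) // 2 < k:
        p += 1
    return p


def sum_width_m(a, m):
    """Two-phase summation with m common cells; returns (sum, phase-1 steps, phase-2 steps).
    Values are computed directly; only the step counts model the parallel schedule."""
    assert a and m >= 1
    size = math.ceil(len(a) / m)
    groups = [a[j:j + size] for j in range(0, len(a), size)]  # at most m groups
    cells = [sum(g) for g in groups]                           # phase 1: one cell per group
    phase1 = max(one_cell_steps(len(g)) for g in groups)
    phase2 = 0
    while len(cells) > 1:                                      # phase 2: pairwise tree reduction
        cells = [sum(cells[j:j + 2]) for j in range(0, len(cells), 2)]
        phase2 += 1
    return cells[0], phase1, phase2


print(sum_width_m(list(range(1, 1025)), 16))  # (524800, 11, 4)
```

With $n = 1024$ and $m = 16$ the two phases take 11 and 4 steps, to be compared with $\sqrt{n/m} = 8$ and $\log_2 m = 4$.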

As before, the number of processors used is optimal up to a constant factor. Since $m = O(n/\log^2 n)$ implies $\log m = O(\sqrt{n/m})$, the total time is $O(\sqrt{n/m})$. This upper bound establishes that our lower bound for Parity on the CRCW PRAM(m) and Beame's lower bound for the OR on the CREW PRAM(m) are tight.

We conclude by mentioning what is known when the communication width is larger than the input size. If the input values are taken from a finite domain, the sum can be computed in constant time using exponential width and an exponential number of processors. If those two resources are bounded by a polynomial in $n$, the best upper bound known is $O(\log n/\log\log n)$ [CSV-82].

4. Conclusions and Open Problems

Using communication-based arguments to prove lower bounds in Computer Science is an old idea. The crossing-sequence technique [HU-79] for Turing machines essentially measures communication between work-tape cells. This technique was extended to measure communication between two halves of a VLSI circuit [Y-81, LS-81, PS-82] and to obtain Time-Area trade-offs.

We consider this paper to be a first step towards understanding the central role played by communication in efficient parallel computation. The view of communication as a resource in parallel machines gives rise to many questions. We mention a few below.

(1) Our lower bound for the OR on the CREW PRAM, combined with that of Cook and Dwork, covers the whole range of $m$. On the other hand, the lower bound for the Parity function on the CRCW PRAM(m) becomes trivial when $m > n$. The case where $m$ is only bounded by a polynomial in $n$ is of particular interest, since a lower bound on the time here would give a lower bound on the depth of polynomial-size Parity circuits.


(2) Consider parallel RAMs in which processors are allowed to be probabilistic or non-deterministic. In the deterministic version of the CRCW PRAM(1) which we studied here, both the Parity and the Max functions have an $\Omega(\sqrt{n})$ lower bound on the time. If we allow non-determinism, the maximum of $n$ numbers can be computed in constant time. However, we conjecture that the lower bound still holds for Parity even in the non-deterministic model.

(3) Study Time-Width-Processors trade-offs for other functions.
Appendix

Lemma 2.2: For every $x, y \in E^T$, $f(x) = f(y)$.

Proof of Lemma 2.2: We use the following notation. For an input $x \in I$:

$q_j^t(x)$ and $s^t(x)$ are respectively the state of $p_j$ and the contents of $C$ at time $t$ for the input $x$.

$R_j^t(x) = \{\rho_j(q_j^\tau(x)) \mid 0 \le \tau < t\}$ is the set of input cells read by $p_j$ through time $t$. Set $R_0^t(x) = \emptyset$.

$w^t(x)$ is the index of the processor that writes at time $t$ on input $x$. If there is no such processor at time $t$ for $x$, $w^t(x) = 0$.

$W^t(x) = \bigcup_{\tau=1}^{t} R_j^\tau(x)$, where $j = w^\tau(x)$, is the set of input cells read by all writers through time $t$.

$FW^t(x) = \{w^\tau(x) \mid t \le \tau \le T,\ w^\tau(x) \ne 0\}$ is the set of future writers from time period $t$ on.

Let $x$ and $y$ be elements of $E^T$. It is sufficient to show that $s^T(x) = s^T(y)$. We prove by induction on $t$ that $w^t(x) = w^t(y)$, $W^t(x) = W^t(y)$, $s^t(x) = s^t(y)$, and that for every $j \in FW^t(x)$, $q_j^t(x) = q_j^t(y)$ and $R_j^t(x) = R_j^t(y)$.

$t = 0$. For every processor $j$, $q_j^0(x) = q_j^0(y) = q_j^0$ and $R_j^0(x) = R_j^0(y) = \emptyset$. Also, $s^0(x) = s^0(y) = b_0$, $W^0(x) = W^0(y) = \emptyset$ and $w^0(x) = w^0(y) = 0$.

$t > 0$. Assume the claim holds for every $\tau < t$. Let $j \in FW^t(x)$. Let $i(x) = \rho_j(q_j^{t-1}(x))$ and $i(y) = \rho_j(q_j^{t-1}(y))$. By the induction hypothesis $i(x) = i(y)$, and hence $R_j^t(x) = R_j^t(y)$. Since $j \in FW^t(x)$, $R_j^t(x) \subseteq W^T(x)$, which, using $x, y \in E^T$, implies that $x_{i(x)} = y_{i(y)}$. From this and the induction hypothesis we get $q_j^t(x) = q_j^t(y)$. Let $\sigma_j(x) = \sigma_j(q_j^t(x))$ and $\sigma_j(y) = \sigma_j(q_j^t(y))$. Then $\sigma_j(x) = \sigma_j(y)$. There are two cases to consider now.

Case 1: $s^t(x) = s^{t-1}(x)$. By the construction of $E^T$ and induction, $s^t(y) = s^t(x) = s^{t-1}(x)$, $w^t(x) = w^t(y) = 0$ and $W^t(x) = W^t(y) = W^{t-1}(x)$.

Case 2: $s^t(x) \ne s^{t-1}(x)$. Let $j = w^t(x)$. Again by construction of $E^T$, there can be no $l < j$ s.t. $\sigma_l(q_l^t(y)) \ne s^{t-1}(y)$, and since $\sigma_j(x) = \sigma_j(y)$ we have $w^t(y) = j = w^t(x)$, $s^t(y) = s^t(x)$ and $W^t(y) = W^t(x)$. ■



Lemma 2.3: For every $t$, $|R_0^t| \le t(t+1)/2$.

Proof of Lemma 2.3: Denote by $Z^*$ the set of nonnegative integers; $i, j, k, l$ denote only positive integers. Also, for a subset $S \subseteq \{1, 2, \ldots, n\}$ and inputs $x, y \in I$:

$x \equiv y \pmod{S}$ means $x_i = y_i$ for all $i \in S$;

$I_e(S) = \{x \in I \mid x \equiv e \pmod{S}\}$.
Claim: Given an integer $t$, a set $S \subseteq \{1, 2, \ldots, n\}$, a function $h : I_e(S) \to Z^*$, and sets $S_j \subseteq \{1, 2, \ldots, n\}$ for every positive $j \in h(I_e(S))$ that satisfy

(1) $S_j \cap S = \emptyset$ for all $j$;

(2) $|S_j| \le t$ for all $j$;

(3) $h(e) = 0$;

(4) $h(x) = j$ and $y \equiv x \pmod{S_j}$ implies $h(y) = j$;

(5) $h(x) = j$, $h(y) = k$ and $j \ne k$ implies that there exists an $i \in S_j \cap S_k$ s.t. $x_i \ne y_i$;

then there exists a set $R \subseteq \{1, 2, \ldots, n\}$ s.t. $|R| \le t(t+1)/2$ and $h(I_e(S \cup R)) = \{0\}$.
Connection between the claim and the lemma. Recall that we wanted to prove the existence of $t(t+1)/2$ input positions such that fixing them to the values of $e$ ensures that no one writes at time $t$ in case 3. Let $R_j$ be defined as in the proof of lemma 2.2. Let $S$ be the set of input positions fixed through time $t-1$ ($S = R^{t-1}$, so $I_e(S) = E^{t-1}$), let $S_j = R_j - S$ for all $j$, and define the function $h : I_e(S) \to Z^+$ by $h(x) = j$ if $p_j$ is the (unique) processor that writes on $x$ at time $t$, and $h(x) = 0$ if no one writes on $x$ at time $t$. Let us verify that properties (1)-(5) hold.

(1) By the definition of $S_j$.

(2) $|S_j| = |R_j - S| \le |R_j| \le t$.

(3) We deal here only with case 3, in which no processor writes on $e$; hence $h(e) = 0$.

(4) Since $x, y \in I_e(S)$ and $x \equiv y \pmod{S_j}$, we have $x \equiv y \pmod{R_j}$ (both agree with $e$ on $S$, and $R_j \subseteq S \cup S_j$). With an almost identical proof to that of lemma 2.2 we can prove that $q_j^t(x) = q_j^t(y)$, i.e., $p_j$ arrives at the same state at time $t$ for both inputs $x$ and $y$. In particular, $p_j$ writes on $x$ at time $t$ if and only if it writes on $y$ at time $t$.

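(5) Suppose not. Then define an input $z$ by $z_i = x_i$ if $i \in S_j$, $z_i = y_i$ if $i \in S_k - S_j$, and $z_i = e_i$ for the remaining values of $i$. By (1), $z \in I_e(S) = E^{t-1}$. Since $x$ and $y$ agree on $S_j \cap S_k$, we have $z \equiv x \pmod{S_j}$ and $z \equiv y \pmod{S_k}$. Therefore, by (4), both $p_j$ and $p_k$ write on $z$ at time $t$, contradicting the definition of the CREW PRAM.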
Now we can take $R_t = R$, which completes the proof of the lemma. ■
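The hybrid input used in (5) can be illustrated with a minimal Python sketch. This is our own illustration, not code from the paper: inputs are bit tuples, positions are 0-based, and the helper names `hybrid_input` and `agrees_on` are hypothetical. The sketch builds $z$ from $x$, $y$ and $e$, and the checks cover the two facts the argument needs: $z \equiv x \pmod{S_j}$ and, whenever $x$ and $y$ agree on $S_j \cap S_k$, $z \equiv y \pmod{S_k}$.

```python
from typing import Sequence, Set, Tuple

def hybrid_input(x: Sequence[int], y: Sequence[int], e: Sequence[int],
                 S_j: Set[int], S_k: Set[int]) -> Tuple[int, ...]:
    """Input z of property (5): x on S_j, y on S_k - S_j, e elsewhere."""
    z = []
    for i in range(len(e)):
        if i in S_j:
            z.append(x[i])
        elif i in S_k:          # here i lies in S_k - S_j
            z.append(y[i])
        else:
            z.append(e[i])
    return tuple(z)

def agrees_on(u: Sequence[int], v: Sequence[int], positions: Set[int]) -> bool:
    """True iff u and v agree on every given position, i.e. u = v (mod positions)."""
    return all(u[i] == v[i] for i in positions)
```

Take, for example, $e = 00000$, $S_j = \{0, 1\}$, $S_k = \{1, 2\}$, $x = 11000$ and $y = 01100$. These two inputs agree on position 1, the only shared position. The sketch then gives $z = 11100$, which agrees with $x$ on $S_j$, with $y$ on $S_k$, and with $e$ elsewhere.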

Proof of Claim: The proof is by induction on $t$.

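$t = 0$. In this case $h(I_e(S)) = \{0\}$. Otherwise, suppose that for some $x \in I_e(S)$ there exists $j > 0$ with $h(x) = j$. Since $|S_j| \le 0$, $S_j = \emptyset$ and trivially $e \equiv x \pmod{S_j}$, so by (4) also $h(e) = j$, a contradiction to (3).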

$t > 0$. If $h(I_e(S)) = \{0\}$ we are done. Assume that for some $x \in I_e(S)$, $h(x) = l > 0$. Set $S' = S \cup S_l$, let $h'$ be the restriction of $h$ to $I_e(S')$, and set $S'_j = S_j - S_l$ for all $j \in h'(I_e(S'))$. Then we have the following:

(1') $S' \cap S'_j = \emptyset$ for all $j$. Clear.

(2') $|S'_j| \le t - 1$ for all $j \in h'(I_e(S'))$. Since $l, j \in h(I_e(S))$, by (5) $S_j \cap S_l \ne \emptyset$ for $j \ne l$ (and $S'_l = \emptyset$), and therefore $|S'_j| = |S_j - S_l| \le |S_j| - 1 \le t - 1$.

(3') $e \in I_e(S')$ and $h'(e) = 0$. Clear.

(4') $h'(x) = j$ and $y \equiv x \pmod{S'_j}$ imply $h'(y) = j$. Since $x, y \in I_e(S')$, $y \equiv x \equiv e \pmod{S_l}$, and therefore $y \equiv x \pmod{S_j}$. Hence $j = h'(x) = h(x) = h(y) = h'(y)$.

(5') $h'(x) = j$, $h'(y) = k$ and $j \ne k$ imply that there is an $i \in S'_j \cap S'_k$ such that $x_i \ne y_i$. Since $h(x) = j$ and $h(y) = k$, there must be such an $i$ in $S_j \cap S_k$. However, since $x \equiv y \equiv e \pmod{S_l}$, $i$ must belong to $S'_j \cap S'_k$.

By the induction hypothesis, there exists a set $R'$ such that $|R'| \le (t-1)t/2$ and $h'(I_e(S' \cup R')) = \{0\}$. Set $R = R' \cup S_l$. Then $|R| \le |R'| + |S_l| \le (t-1)t/2 + t = t(t+1)/2$, and

$h(I_e(S \cup R)) = h(I_e(S' \cup R')) = h'(I_e(S' \cup R')) = \{0\}$. ■
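The induction in the proof of the Claim amounts to a greedy procedure. While some input in $I_e(S \cup R)$ still has a writer $l$, fix the positions of $S_l$ to the values of $e$. The Python sketch below unrolls that recursion. It is our own brute-force illustration over $\{0,1\}^n$, so it is practical only for tiny $n$, and the names `fixed_inputs` and `find_R` are hypothetical. The sketch assumes that $h$ and the sets $S_j$ already satisfy (1)-(5). Under that assumption each round eliminates the chosen writer (by (3) and (4)). By (5), each round also removes at least one position from every other remaining $S_j$. So at most $t$ rounds occur, and $|R| \le t + (t-1) + \cdots + 1$.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Set, Tuple

Input = Tuple[int, ...]

def fixed_inputs(n: int, e: Input, S: Set[int]) -> Iterator[Input]:
    """Enumerate I_e(S): every x in {0,1}^n with x_i = e_i for all i in S."""
    for x in product((0, 1), repeat=n):
        if all(x[i] == e[i] for i in S):
            yield x

def find_R(n: int, e: Input, S: Set[int],
           h: Callable[[Input], int], S_sets: Dict[int, Set[int]]) -> Set[int]:
    """Iterative form of the induction in the proof of the Claim.

    While some x in I_e(S u R) has h(x) = l > 0, fix the not-yet-fixed
    positions of S_l to e's values and add them to R.
    """
    R: Set[int] = set()
    fixed = set(S)
    while True:
        writer = next((w for w in map(h, fixed_inputs(n, e, fixed)) if w != 0), 0)
        if writer == 0:
            return R                  # h(I_e(S u R)) = {0}
        new = S_sets[writer] - fixed  # the current, already shrunken S_l
        R |= new
        fixed |= new
```

Consider a small example with $n = 4$, $e = 0000$, $S = \emptyset$, $S_1 = \{0, 1\}$ and $S_2 = \{1, 2\}$. Writer 1 acts on inputs with $x_0 = x_1 = 1$, and writer 2 on inputs with $x_1 = 0,\ x_2 = 1$. The first writer found is 2. Fixing positions 1 and 2 to 0 already silences both writers, so $R = \{1, 2\}$ and $|R| = 2 \le t(t+1)/2 = 3$.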

Theorem 2.3: Let $N$ be a CREW PRAM(1) that computes a function $g$ in time $T$ such that $\exists e \in I$ ($I = \{0,1\}^n$) for which $|\{\, x \in I \mid g(x) = g(e) \,\}| \le |I|/r$. Then $T = \Omega(\sqrt{\log r})$.

Set $E^0 = I$ and $F^0 = I \setminus E^0 = \emptyset$. Consider time period $t$. For any $j$, let $P_j$ be the set of input positions read by processor $p_j$ up to time $t$, and define $E^t$ and its complement $F^t$ as follows:

Case 1: There is no processor $p_j$ and no input $x \in E^{t-1}$ such that $p_j$ writes on $x$ at time $t$. Then $E^t \leftarrow E^{t-1}$ and $F^t \leftarrow F^{t-1}$.

Case 2: No processor writes on $e$ at time $t$, but there is an $x \in E^{t-1}$ such that some $p_j$ writes on $x$ at time $t$:

For every input $x \in E^{t-1}$ which causes a processor $p_j$ to write at time $t$, define
$C_x^t = \{\, y \in I \mid y_i = x_i \text{ for all } i \in P_j \,\}$.
Each $C_x^t$ is specified by the values in at most $t$ input cells, since $|P_j| \le t$. It is clear that any $x \in E^{t-1}$ which causes a write at time $t$ is in some $C_y^t$. Also, any $y \in E^{t-1} \cap C_x^t$ will cause a write at time $t$, since processor $p_j$ at time $t$ will not be able to distinguish $y$ from $x$. Thus if we eliminate the elements of these "cubes" from $E^{t-1}$, no writes will occur at time $t$.

The "cubes" also satisfy an additional property. If $C_x^t \cap C_y^t \ne \emptyset$, then their shared specifying positions must agree in value. Therefore, if $C_x^t \ne C_y^t$, then $C_x^t$ and $C_y^t$ must be specified by different input cells and so correspond to different processors. It follows that $C_x^t \cap C_y^t \subseteq I \setminus E^{t-1} = F^{t-1}$; otherwise there would be a simultaneous write, which is not allowed.
Thus, if we designate the distinct cubes as $\{C_i^t\}$, then

$F^t \leftarrow F^{t-1} \cup \left(\bigcup_i C_i^t\right)$, where $\forall i \ne j,\ C_i^t \cap C_j^t \subseteq F^{t-1}$.
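The intersection property above rests on two elementary facts about cubes. Lemma 2.8 later relies on the same facts when it treats $C \cap C_i^t$ as a cube. First, a cube specified by $k$ input cells contains exactly $2^{n-k}$ inputs. Second, two cubes meet iff they agree on their shared specifying positions, and in that case the intersection is the cube specified by the union of their positions. Here is a minimal Python sketch that encodes a cube as a dict from input position to fixed bit. The encoding is ours, and `cube_size` and `intersect` are hypothetical names.

```python
from typing import Dict, Optional

# A cube fixes the bits at some input positions and leaves the rest free.
Cube = Dict[int, int]

def cube_size(c: Cube, n: int) -> int:
    """Number of inputs in {0,1}^n lying in a cube specified by len(c) cells."""
    return 2 ** (n - len(c))

def intersect(c1: Cube, c2: Cube) -> Optional[Cube]:
    """Return C1 ∩ C2 as a cube, or None if the intersection is empty.

    The intersection is empty iff the cubes disagree on a shared position;
    otherwise it is specified by the union of the two position sets.
    """
    for i in c1.keys() & c2.keys():
        if c1[i] != c2[i]:
            return None
    return {**c1, **c2}
```

For instance, with $n = 4$, the cubes `{0: 1, 1: 0}` and `{1: 0, 2: 1}` meet in `{0: 1, 1: 0, 2: 1}`, which contains $2^{4-3} = 2$ inputs. By contrast, `{0: 1, 1: 0}` and `{1: 1}` are disjoint.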

Case 3: There is a (unique) processor $p_j$ that writes on $e \in E^{t-1}$ at time $t$: Then we require that the input agree with $e$ in the positions of $P_j$. We may regard this as requiring that the input be in the cube which is the subset of the input specified by these $|P_j| \le t$ values. Equally well, this may be regarded as excluding from the input all values which are in the cubes specified by the other $2^{|P_j|} - 1$ possible settings of values in these positions. If we call these excluded cubes $\{C_i^t\}$ as in case 2, it is immediate that $\forall i \ne j,\ C_i^t \cap C_j^t = \emptyset \subseteq F^{t-1}$.
Then, as in case 2, we have

$E^t \leftarrow E^{t-1} \setminus \left(\bigcup_i C_i^t\right)$ and $F^t \leftarrow F^{t-1} \cup \left(\bigcup_i C_i^t\right)$.
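The counting in case 3 can be checked by brute force on a toy example. The Python sketch below lists the $2^{|P_j|} - 1$ cubes that come from every setting of the positions $P_j$ other than $e$'s own. These cubes are pairwise disjoint, and together they cover exactly the inputs that disagree with $e$ somewhere on $P_j$. This is our own illustration: `excluded_cubes` and `members` are hypothetical names, inputs are bit tuples, and positions are 0-based.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

def excluded_cubes(e: Sequence[int], P: Sequence[int]) -> List[Dict[int, int]]:
    """Cubes given by every setting of the positions P except e's own values."""
    own = tuple(e[i] for i in P)
    return [dict(zip(P, bits))
            for bits in product((0, 1), repeat=len(P))
            if bits != own]

def members(cube: Dict[int, int], n: int) -> Set[Tuple[int, ...]]:
    """All inputs in {0,1}^n that lie in the cube (dict: position -> bit)."""
    return {x for x in product((0, 1), repeat=n)
            if all(x[i] == b for i, b in cube.items())}
```

With $n = 4$, $e = 0000$ and $P_j = \{0, 2\}$, this gives 3 disjoint cubes of 4 inputs each. Together they are the $2^4 - 2^{4-2} = 12$ inputs excluded from $E^t$.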


Lemma 2.8: For any $t \ge 0$ and any "cube" $C$ which is specified by at most $s$ cells of the input, there exists an integer $r$ such that

$|C \cap F^t| = \dfrac{r \cdot 2^n}{2^{\,s + t(t+1)/2}}$.
Proof: By induction on $t$.
$t = 0$: $F^0 = \emptyset$, so the claim is true with $r = 0$.
Assume the claim for $t - 1$:

$F^t = F^{t-1} + \sum_i (C_i^t \setminus F^{t-1})$, since $\forall i \ne j,\ C_i^t \cap C_j^t \subseteq F^{t-1}$ (we use additive notation for disjoint union).

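Therefore

$$\begin{aligned}
|C \cap F^t| &= \Big|C \cap \Big(F^{t-1} + \sum_i (C_i^t \setminus F^{t-1})\Big)\Big| \\
&= |C \cap F^{t-1}| + \sum_i \big|(C \cap C_i^t) \setminus F^{t-1}\big| \\
&= |C \cap F^{t-1}| + \sum_i \big(|C \cap C_i^t| - |(C \cap C_i^t) \cap F^{t-1}|\big).
\end{aligned}$$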

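If we only sum over non-empty intersections, then $C \cap C_i^t$ is a subset of the input which restricts the input only by specifying input positions, and which is specified by at most $s + t$ of them. Thus we may designate $C'_i = C \cap C_i^t$. Therefore

$$\begin{aligned}
|C \cap F^t| &= |C \cap F^{t-1}| + \sum_i \big(|C'_i| - |C'_i \cap F^{t-1}|\big) \\
&= \frac{p \cdot 2^n}{2^{\,s + t(t-1)/2}} + \sum_i \left(\frac{q_i \cdot 2^n}{2^{\,s + t}} - \frac{r_i \cdot 2^n}{2^{\,s + t + t(t-1)/2}}\right),
\end{aligned}$$

where $p, q_i, r_i$ are integers.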


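This follows by the inductive hypothesis for the first and last terms (applied to $C$ with $s$, and to $C \cap C_i^t$ with $s + t$ in place of $s$), and because of the form of the middle terms: a cube specified by $k \le s + t$ cells has exactly $2^{n-k}$ elements.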

Since all of the denominators divide $2^{\,s + t(t+1)/2}$, the claim holds for $t$, and the lemma is proved. ■
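The exponent bookkeeping behind the last step can be written out explicitly. The first term comes from the hypothesis for $t-1$ applied to $C$ (at most $s$ cells). The middle terms come from $C \cap C_i^t$ being specified by at most $s+t$ cells. The last terms come from the hypothesis for $t-1$ applied to $C \cap C_i^t$ with $s+t$ in place of $s$. For $t \ge 1$, each of the three exponents is at most $s + t(t+1)/2$:

```latex
\begin{aligned}
s + \tfrac{(t-1)t}{2} &\le s + \tfrac{t(t+1)}{2}, \\
s + t &\le s + \tfrac{t(t+1)}{2} \qquad (t \ge 1), \\
(s + t) + \tfrac{(t-1)t}{2} &= s + \tfrac{t(t+1)}{2}.
\end{aligned}
```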

Lemma 2.9: $e \in E^T$ and $\forall x \in E^T$, $g(x) = g(e)$. ■

Proof of Theorem 2.3: If we apply Lemma 2.8 with $s = 0$, then $C = I$, and we see that $|F^T|$ is an integral multiple of $\dfrac{2^n}{2^{T(T+1)/2}}$. Since $E^T = I \setminus F^T$ and $|I| = 2^n$ is also such a multiple, it follows that $|E^T|$ is a multiple of this number as well. Now $e \in E^T$, so we have $|E^T| > 0$ and thus

$$|E^T| \ge \frac{2^n}{2^{T(T+1)/2}}.$$

By Lemma 2.9 every $x \in E^T$ satisfies $g(x) = g(e)$, so by our assumption on $g$ we need $|E^T| \le |I|/r$. Therefore $2^{T(T+1)/2} \ge r$, and so $T = \Omega(\sqrt{\log r})$. ■
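Here is a worked instance of the final inequality. Take $g$ to be Boolean OR and $e = 0^n$. Only $e$ itself has $g(x) = g(e)$, so the hypothesis holds with $r = 2^n$. The inequality $2^{T(T+1)/2} \ge r$ then becomes $T(T+1)/2 \ge n$, which gives $T = \Omega(\sqrt{n})$ for a CREW PRAM with a single shared cell. The short Python sketch below computes the smallest $T$ that the inequality permits for a given $\log_2 r$. The name `min_depth` is hypothetical, and the sketch only evaluates the bound; it does not simulate any PRAM.

```python
def min_depth(log2_r: float) -> int:
    """Smallest integer T with T(T+1)/2 >= log2_r, i.e. 2^(T(T+1)/2) >= r."""
    T = 0
    while T * (T + 1) / 2 < log2_r:
        T += 1
    return T

# Boolean OR on n bits: r = 2^n, so log2_r = n.
for n in (10, 100, 1000):
    print(n, min_depth(n))   # prints "10 4", "100 14", "1000 45"
```

For $n = 1000$ the bound forces $T \ge 45$, which is close to $\sqrt{2n} \approx 44.7$ and in line with $T = \Omega(\sqrt{n})$.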
ABSTRACT

A new technique for proving lower bounds for parallel computation is introduced. This technique enables us to obtain, for the first time, non-trivial tight lower bounds for shared-memory models of parallel computation that allow several processors to have simultaneous access to the same memory location. Specifically, we use a concurrent-read concurrent-write model of parallel computation. It has $p$ processors, each of which has access to a common memory of size $m$ (also called the communication width, or width for short). The input to the problem is located in an additional read-only portion of the common memory.

For a wide variety of problems (including parity, majority and summation) we show that the time complexity $T$ (depth) and the communication width $m$ are related by the trade-off curve $mT^2 = \Omega(n)$ (where $n$ is the size of the input), regardless of the number of processors. Moreover, for every point on this curve with $m = O(n/\log^2 n)$ we give a matching upper bound with the optimal number of processors.

We extend our technique to prove an $mT^3 = \Omega(n)$ trade-off for a class of "simpler" functions (including Boolean Or) on a weaker model that forbids simultaneous write access. We also state and give a proof of a new result by Beame [B-83] that achieves a tight lower bound for the OR in this model, namely $mT^2 = \Omega(n)$. These results improve the lower bound of Cook and Dwork [CD-82] when communication is limited.